This is the first notebook in this project.
It documents how I explored aviation data, a subject I am interested in.
Flight operation data were obtained from the United States Department of Transportation.
Now that the data is imported, I want to state the scope, data limitations, and objectives:
# Keep only flights departing from LAX
LAX_outbound_flight_data <- raw_flight_data %>%
  filter(ORIGIN_AIRPORT_ID %in% LAX_airport)
The following table summarises some of the most frequently flown routes originating from LAX in the first six months of 2019.
flight_destinations_by_airport <- LAX_outbound_flight_data %>%
  group_by(ORIGIN, DEST) %>%
  summarise(number_of_flights = n(),
            mean_distance = mean(DISTANCE)) %>%
  arrange(desc(number_of_flights))
datatable(flight_destinations_by_airport)
There is another way to look at it. New York City is served by three airports, and it is not the only major city served by more than one. Grouping routes by destination city rather than by destination airport gives the following table.
flight_destinations_by_city <- LAX_outbound_flight_data %>%
  group_by(ORIGIN, DEST_CITY_NAME) %>%
  summarise(number_of_flights = n(),
            mean_distance = round(mean(DISTANCE), 2)) %>%
  arrange(desc(number_of_flights))
datatable(flight_destinations_by_city)
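To make the difference between the two groupings concrete, here is a minimal sketch on made-up rows. The airport codes (`AAA`, `AAB`, `BBB`), city names, and distances are placeholders I invented for illustration, not values from the DOT data. It assumes dplyr 1.0 or later for the `.groups` argument.

```r
library(dplyr)

# Made-up rows for illustration only: codes, cities and distances are placeholders.
toy_flights <- data.frame(
  ORIGIN = "LAX",
  DEST = c("AAA", "AAA", "AAB", "BBB", "BBB"),
  DEST_CITY_NAME = c("City A", "City A", "City A", "City B", "City B"),
  DISTANCE = c(2400, 2400, 2420, 300, 300)
)

# One row per destination airport
by_airport <- toy_flights %>%
  group_by(ORIGIN, DEST) %>%
  summarise(number_of_flights = n(), .groups = "drop") %>%
  arrange(desc(number_of_flights))

# One row per destination city: airports in the same city are pooled
by_city <- toy_flights %>%
  group_by(ORIGIN, DEST_CITY_NAME) %>%
  summarise(number_of_flights = n(), .groups = "drop") %>%
  arrange(desc(number_of_flights))

by_airport
by_city
```

In this toy case no single airport in "City A" has more flights than `BBB`, yet the city as a whole ranks first once its two airports are pooled.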
Alternatively, routes can be grouped by what the Department of Transportation defines as a destination city market.
A random thought: Can destination city markets be represented as Thiessen polygons and the distance to the nearest airport be mapped/calculated?
flight_destinations_by_market <- LAX_outbound_flight_data %>%
  group_by(ORIGIN, DEST_CITY_MARKET_ID) %>%
  summarise(number_of_flights = n(),
            mean_distance = round(mean(DISTANCE), 2)) %>%
  arrange(desc(number_of_flights)) %>%
  inner_join(., city_market, by = c("DEST_CITY_MARKET_ID" = "Code"))
datatable(flight_destinations_by_market)
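The Thiessen polygon thought above could be explored along these lines. This is only a sketch: it assumes the sf package (which the notebook itself does not load), and the airport codes and coordinates are made-up planar placeholders with no coordinate reference system. Real airport locations in longitude/latitude would first need projecting to a planar CRS, since Voronoi construction in sf is planar.

```r
library(sf)

# Hypothetical airports in arbitrary planar units (not real coordinates).
airports <- st_as_sf(
  data.frame(code = c("AAA", "AAB", "BBB"),
             x = c(0, 10, 5),
             y = c(0, 0, 8)),
  coords = c("x", "y")
)

# Thiessen (Voronoi) polygons: each cell holds the area closest to one airport.
cells <- st_collection_extract(st_voronoi(st_union(airports)), "POLYGON")
cells <- st_sf(geometry = cells)

# The output order is not guaranteed to match the input order,
# so re-attach airport codes with a spatial lookup.
cells$code <- airports$code[unlist(st_intersects(cells, airports))]

# A hypothetical location whose nearest airport we want.
place <- st_sfc(st_point(c(1, 1)))
nearest_index <- st_nearest_feature(place, airports)
nearest_code <- airports$code[nearest_index]
nearest_distance <- as.numeric(st_distance(place, airports[nearest_index, ]))

# The cell containing the location should belong to the same airport.
cell_code <- cells$code[unlist(st_intersects(place, cells))]
```

The nearest-airport lookup and the polygon lookup are two views of the same partition, so `nearest_code` and `cell_code` should agree for any location inside the polygons' extent.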
The distribution of flight counts across destination markets can be shown as a histogram.
frequency_histogram <- ggplot(flight_destinations_by_market, aes(number_of_flights)) +
  geom_histogram(binwidth = 200)
ggplotly(frequency_histogram)
Another question: Just because Los Angeles is on the West Coast, does it mean that it mainly serves West Coast and American Southwest destinations?
frequency_distance_scatterplot <- ggplot(flight_destinations_by_market,
                                         aes(x = mean_distance, y = number_of_flights)) +
  geom_point()
ggplotly(frequency_distance_scatterplot)
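One way to put a number on what the scatterplot shows is the share of all flights that go to destinations within some distance. The helper `share_within()` and the cut-off used below are my own hypothetical choices, and the toy rows are made up; the function only assumes a data frame with `mean_distance` and `number_of_flights` columns, like `flight_destinations_by_market`.

```r
# Share of flights going to destinations no farther than `max_distance`
# (same units as the mean_distance column). Hypothetical helper.
share_within <- function(df, max_distance) {
  nearby <- df$mean_distance <= max_distance
  sum(df$number_of_flights[nearby]) / sum(df$number_of_flights)
}

# Made-up rows for illustration only.
toy_markets <- data.frame(
  mean_distance = c(300, 900, 2400),
  number_of_flights = c(50, 30, 20)
)

toy_share <- share_within(toy_markets, max_distance = 1000)
toy_share
```

On the real data, a share close to 1 at a short cut-off would suggest that LAX mainly serves nearby markets; the scatterplot above remains the evidence this notebook relies on.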
Apparently not. But what if we look at the spatial distribution of destinations connected to Los Angeles at the state level?
It appears that states adjacent to California tend to receive more flights than those farther away. However, there are some exceptions, such as IL, GA, FL, NY, and HI. Apart from HI, these states (IL, NY, GA, and FL) are home to hubs of the three main US carriers. HI’s high degree of connectivity with LAX is due to its geographical isolation from every other state bar California. Hence, the four main US carriers (Alaska, American, United, and Delta) and Hawaiian use LAX as a gateway to Hawaii, fed by the high numbers of flights from across the United States into each carrier’s hub terminal at LAX.
# Create map
# Aggregate flights by state: count flights per state and month, then average over months
lax_destinations_state <- LAX_outbound_flight_data %>%
  group_by(DEST_STATE_ABR, MONTH) %>%
  summarise(monthly_sum_flights = n()) %>%
  group_by(DEST_STATE_ABR) %>%
  summarise(mean_monthly_flights = round(mean(monthly_sum_flights)))
# Super important: even though imported shapefiles behave like data frames, merging one
# with a data frame using anything but merge() from sp returns a data frame, NOT a spatial object
lax_destinations_state_shp <- merge(US_state_map, lax_destinations_state,
                                    by.y = "DEST_STATE_ABR", by.x = "STUSPS", all.x = TRUE)
lax_destination_state_map_layer <- tm_shape(lax_destinations_state_shp) +
  tm_polygons(col = "mean_monthly_flights", border.col = NA, palette = "viridis") +
  tm_text("STUSPS")
lax_destinations_state_leaflet <- tmap_leaflet(x = lax_destination_state_map_layer)
lax_destinations_state_leaflet
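One subtlety of the state-level aggregation is worth a small illustration. Counting flights per state and month and then averaging means that a month with no flights to a state contributes no row at all, so the mean is taken over months with at least one flight rather than over every month in the period. The sketch below uses made-up rows and placeholder state codes (`XX`, `YY`) to show the difference. Whether this matters for the real data depends on whether any state has months without LAX flights, which I have not checked.

```r
library(dplyr)

# Made-up rows: "XX" has flights in months 1 and 2, "YY" only in month 1.
toy_state_flights <- data.frame(
  DEST_STATE_ABR = c("XX", "XX", "XX", "XX", "YY", "YY"),
  MONTH = c(1, 1, 2, 2, 1, 1)
)

monthly <- toy_state_flights %>%
  count(DEST_STATE_ABR, MONTH, name = "monthly_sum_flights")

# Same logic as the map code: mean over the months that appear for each state.
mean_over_observed_months <- monthly %>%
  group_by(DEST_STATE_ABR) %>%
  summarise(mean_monthly_flights = mean(monthly_sum_flights))

# Alternative: divide by the number of months in the whole period,
# so months without flights count as zero.
n_months <- n_distinct(toy_state_flights$MONTH)
mean_over_all_months <- monthly %>%
  group_by(DEST_STATE_ABR) %>%
  summarise(mean_monthly_flights = sum(monthly_sum_flights) / n_months)
```

In the toy data both states average 2 flights under the first definition, while under the second "YY" drops to 1 because its empty month is counted.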